Modularity through Attention: Efficient Training and Transfer of Language-Conditioned Policies for Robot Manipulation
Language-conditioned policies allow robots to interpret and execute human
instructions. Learning such policies requires a substantial investment of
time and compute resources, yet the resulting controllers are highly
device-specific and cannot easily be transferred to a robot with a
different morphology, capability, appearance, or dynamics. In this paper, we
propose a sample-efficient approach for training language-conditioned
manipulation policies that allows for rapid transfer across different types of
robots. By introducing a novel method called Hierarchical Modularity and
adopting supervised attention across multiple sub-modules, we bridge the divide
between modular and end-to-end learning and enable the reuse of functional
building blocks. In both simulated and real-world robot manipulation
experiments, we demonstrate that our method outperforms current
state-of-the-art methods and can transfer policies across four different
robots in a sample-efficient manner. Finally, we show that the functionality
of learned sub-modules is maintained beyond the training process and can be
used to introspect the robot's decision-making process. Code is available at
https://github.com/ir-lab/ModAttn.
Comment: 2022 Conference on Robot Learning (CoRL).
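To make the supervised-attention idea concrete, here is a minimal PyTorch sketch; the module layout, shapes, and names (SubModule, attention_loss) are illustrative assumptions, not the released ModAttn code. Each functional block attends over input tokens, and an auxiliary loss supervises where it should attend.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SubModule(nn.Module):
    # One reusable functional block (e.g., locating the target object).
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # per-token attention logits
        self.head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                  nn.Linear(dim, dim))

    def forward(self, tokens):           # tokens: (batch, n_tokens, dim)
        logits = self.score(tokens).squeeze(-1)           # (batch, n_tokens)
        weights = F.softmax(logits, dim=-1)
        pooled = torch.einsum("bn,bnd->bd", weights, tokens)
        return self.head(pooled), weights

def attention_loss(weights, target_token):
    # Supervised attention: an auxiliary loss telling the sub-module
    # which input token it should focus on for its sub-task.
    return F.nll_loss(torch.log(weights + 1e-8), target_token)

Supervising each block's attention target explicitly, rather than relying on end-to-end gradients alone, is what would let a block trained on one robot be reused or inspected on another.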
Theory of Mind for Multi-Agent Collaboration via Large Language Models
While Large Language Models (LLMs) have demonstrated impressive
accomplishments in both reasoning and planning, their abilities in multi-agent
collaboration remain largely unexplored. This study evaluates LLM-based
agents in a multi-agent cooperative text game with Theory of Mind (ToM)
inference tasks, comparing their performance with Multi-Agent Reinforcement
Learning (MARL) and planning-based baselines. We observed evidence of emergent
collaborative behaviors and high-order Theory of Mind capabilities among
LLM-based agents. Our results reveal limitations in LLM-based agents' planning
optimization due to systematic failures in managing long-horizon contexts and
hallucination about the task state. We explore the use of explicit belief state
representations to mitigate these issues, finding that they enhance task
performance and the accuracy of ToM inferences for LLM-based agents.
Comment: Accepted to EMNLP 2023 (Main Conference).
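As a hedged sketch of what an explicit belief state representation might look like in practice (the fields, prompt format, and act function are assumptions for illustration, not the paper's exact design), the agent maintains a structured record of first- and second-order beliefs and re-injects it into the prompt on every turn:

from dataclasses import dataclass, field

@dataclass
class BeliefState:
    my_location: str = "unknown"
    teammate_locations: dict = field(default_factory=dict)         # first-order ToM
    teammate_beliefs_about_me: dict = field(default_factory=dict)  # second-order ToM
    completed_subgoals: list = field(default_factory=list)

    def to_prompt(self) -> str:
        return (f"Your location: {self.my_location}\n"
                f"Teammate locations: {self.teammate_locations}\n"
                f"What teammates believe about you: {self.teammate_beliefs_about_me}\n"
                f"Completed subgoals: {self.completed_subgoals}")

def act(llm, dialogue_history: str, belief: BeliefState) -> str:
    # Re-injecting the structured state each turn counters the long-horizon
    # context loss and task-state hallucination noted above.
    prompt = (dialogue_history + "\n[BELIEF STATE]\n"
              + belief.to_prompt() + "\nNext action:")
    return llm(prompt)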
Explainable Action Advising for Multi-Agent Reinforcement Learning
Action advising is a knowledge transfer technique for reinforcement learning
based on the teacher-student paradigm. An expert teacher provides advice to a
student during training in order to improve the student's sample efficiency and
policy performance. Such advice is commonly given in the form of state-action
pairs. However, this format makes it difficult for the student to reason about
the advice and apply it to novel states. We introduce Explainable Action
Advising, in which the teacher
provides action advice as well as associated explanations indicating why the
action was chosen. This allows the student to self-reflect on what it has
learned, enabling advice generalization and leading to improved sample
efficiency and learning performance - even in environments where the teacher is
sub-optimal. We empirically show that our framework is effective in both
single-agent and multi-agent scenarios, yielding improved policy returns and
convergence rates when compared to state-of-the-art methods.
Comment: This work has been accepted to ICRA 2023.
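A toy sketch of the teacher-student loop with explanations (the rule format and Student class are hypothetical; the paper's explanations are richer than the string rules used here): the student stores the explanation alongside the action, so it can later re-derive advice in novel states without querying the teacher.

def teacher_advise(state):
    # A (possibly sub-optimal) heuristic teacher whose decision rule
    # doubles as the explanation it hands to the student.
    if state["enemy_distance"] < 2:
        return "retreat", "enemy_distance < 2 -> retreat"
    return "advance", "enemy_distance >= 2 -> advance"

class Student:
    def __init__(self):
        self.rules = []   # retained explanations, not just state-action pairs

    def learn_from_advice(self, state):
        action, explanation = teacher_advise(state)
        self.rules.append(explanation)
        return action

    def self_reflect(self, state):
        # Re-apply stored explanations to a novel state instead of asking
        # the teacher again; this is where the sample-efficiency gain
        # described above would come from. eval() is a toy stand-in.
        for rule in self.rules:
            condition, _, action = rule.partition(" -> ")
            if eval(condition, {}, dict(state)):
                return action
        return None   # fall back to the student's own learned policy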
Long-Horizon Dialogue Understanding for Role Identification in the Game of Avalon with Large Language Models
Deception and persuasion play a critical role in long-horizon dialogues
between multiple parties, especially when the interests, goals, and motivations
of the participants are not aligned. Such complex tasks pose challenges for
current Large Language Models (LLMs), as deception and persuasion can easily
mislead them, especially in long-horizon multi-party dialogues. To this end, we
explore the game of Avalon: The Resistance, a social deduction game in which
players must determine each other's hidden identities to complete their team's
objective. We introduce an online testbed and a dataset containing 20 carefully
collected and labeled games among human players that exhibit long-horizon
deception in a cooperative-competitive setting. We discuss the capabilities of
LLMs to utilize deceptive long-horizon conversations between six human players
to determine each player's goal and motivation. In particular, we discuss the
multimodal integration of the chat between the players and the game's state
that grounds the conversation, providing further insights into the true player
identities. We find that even current state-of-the-art LLMs do not reach human
performance, making our dataset a compelling benchmark to investigate the
decision-making and language-processing capabilities of LLMs. Our dataset and
online testbed can be found at our project website:
https://sstepput.github.io/Avalon-NLU/
Comment: Accepted to the 2023 Conference on Empirical Methods in Natural
Language Processing (EMNLP), Findings of the Association for Computational
Linguistics.
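As an illustration of grounding the dialogue in game state (the field names and prompt layout are assumptions; the dataset's actual schema may differ), role inference can be posed by concatenating a structured game log with the chat transcript:

def build_role_prompt(chat_log: list, game_state: dict) -> str:
    # Summarize the grounding game state (quest teams and outcomes),
    # then append the raw dialogue for the model to reason over jointly.
    state_lines = [f"Quest {q['round']}: team={q['team']}, outcome={q['outcome']}"
                   for q in game_state["quests"]]
    return ("Game state:\n" + "\n".join(state_lines)
            + "\n\nDialogue:\n" + "\n".join(chat_log)
            + "\n\nUsing both the game state and the dialogue, "
              "infer each player's hidden role.")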
Characterizing Out-of-Distribution Error via Optimal Transport
Out-of-distribution (OOD) data poses serious challenges in deployed machine
learning models, so methods of predicting a model's performance on OOD data
without labels are important for machine learning safety. While a number of
methods have been proposed by prior work, they often underestimate the actual
error, sometimes by a large margin, which greatly impacts their applicability
to real tasks. In this work, we identify pseudo-label shift, or the difference
between the predicted and true OOD label distributions, as a key indicator of
this underestimation. Based on this observation, we introduce Confidence
Optimal Transport (COT), a novel method for estimating model performance that
leverages optimal transport theory, and show that it provably provides more
robust error estimates in the presence of pseudo-label shift. Additionally, we
introduce an empirically-motivated variant of COT, Confidence Optimal Transport
with Thresholding (COTT), which applies thresholding to the individual
transport costs and further improves the accuracy of COT's error estimates. We
evaluate COT and COTT on a variety of standard benchmarks that induce various
types of distribution shift -- synthetic, novel subpopulation, and natural --
and show that our approaches significantly outperform existing state-of-the-art
methods, achieving up to 3x lower prediction error.
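As a rough numpy sketch of the COT/COTT idea (using the POT optimal-transport library; the cost design and thresholding rule follow my reading of the abstract, not the authors' released code), the error estimate is the cost of transporting the model's softmax outputs onto one-hot class vectors distributed according to an estimated label marginal:

import numpy as np
import ot   # Python Optimal Transport (pip install pot)

def cot_error_estimate(probs, label_marginal, threshold=None):
    # probs: (n, k) softmax outputs on unlabeled OOD inputs.
    # label_marginal: (k,) estimate of the true label distribution.
    n, k = probs.shape
    onehots = np.eye(k)
    # Total-variation cost between a softmax vector and one-hot c is 1 - p_c,
    # i.e., one minus the model's confidence in class c.
    cost = 0.5 * np.abs(probs[:, None, :] - onehots[None, :, :]).sum(-1)  # (n, k)
    if threshold is not None:
        # COTT-style variant: threshold individual transport costs so only
        # confidently-mismatched assignments count toward the estimate.
        cost = (cost > threshold).astype(float)
    sample_mass = np.full(n, 1.0 / n)
    plan = ot.emd(sample_mass, label_marginal, cost)   # exact OT plan
    return float((plan * cost).sum())   # transport cost ~ estimated error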
Learning Interactive Behaviors for Musculoskeletal Robots Using Bayesian Interaction Primitives
Musculoskeletal robots that are based on pneumatic actuation have a variety of
properties, such as compliance and back-drivability, that render them
particularly appealing for human-robot collaboration. However, programming
interactive and responsive behaviors for such systems is extremely challenging
due to the nonlinearity and uncertainty inherent to their control. In this
paper, we propose an approach for learning Bayesian Interaction Primitives for
musculoskeletal robots given a limited set of example demonstrations. We show
that this approach is capable of real-time state estimation and response
generation for interaction with a robot for which no analytical model exists.
Human-robot interaction experiments on a 'handshake' task show that the
approach generalizes to new positions, interaction partners, and movement
velocities.
IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS 2019), November 4-8, 2019, Macau, China.
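Reading Bayesian Interaction Primitives as a joint Gaussian model over basis-function weights for the human's and robot's degrees of freedom, a minimal conditioning step might look like the sketch below; the dimensions, noise level, and Kalman-style update are illustrative assumptions rather than the paper's full filter.

import numpy as np

def condition_on_human_motion(mu_w, Sigma_w, Phi_obs, y_obs, obs_noise=1e-2):
    # mu_w, Sigma_w: Gaussian prior over stacked basis weights
    #                (human DoFs + robot DoFs), learned from demonstrations.
    # Phi_obs:       basis matrix mapping weights to the human samples
    #                observed so far; y_obs: those observations.
    m = len(y_obs)
    S = Phi_obs @ Sigma_w @ Phi_obs.T + obs_noise * np.eye(m)
    K = Sigma_w @ Phi_obs.T @ np.linalg.inv(S)          # Kalman-style gain
    mu_post = mu_w + K @ (y_obs - Phi_obs @ mu_w)
    Sigma_post = Sigma_w - K @ Phi_obs @ Sigma_w
    # Decoding the robot block of mu_post yields the real-time response.
    return mu_post, Sigma_post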